This work is submitted in partial fulfillment of requirements for the Developing Data Products massive open online course (MOOC), a component of the Data Science Specialization hosted by Johns Hopkins University. The data at hand are the products of a team of journalists at the Washington Post. Detailed background and citation of data sources, along with considerations for reproducibility of my transformation from raw to tidy data, are available in the accompanying final.Rpres document in my project GitHub repo. Throughout, my emphasis will be on reproducibility of my findings.
The unjust practice of enslaving humans for forced labor has been tragically with us since the beginning of recorded history. The history of colonization of the American continent is no exception. This analysis was mindfully begun during the Dr. Martin Luther King, Jr., holiday weekend of 2022, and was completed and is submitted in mindful anticipation of observance of Black History Month 2022 in the USA. The twofold purpose of this submission is to (1) demonstrate how open-source software and public-domain data can be used to answer important questions and (2) offer my perspective on an important and timely issue as a scientist and native of the American Deep South.
Readers will find my tidy data posted in my GitHub repo. Readers who are fluent in R and wish to consider reproducibility are welcome to fork my repo and proceed; all I ask is that you inform me that you have done so, and please feel free to push your work and share it! Please also visit the Washington Post source publication and data (linked in the .Rpres document in my GitHub repo). That project is a splendid example of scholarly journalism, one of many hosted by this newspaper. You may share this work with students, trainees, friends, family, congregations, etc. as you see fit, with proper attribution.
The USA is governed at the federal level by three branches: executive, legislative, and judicial. The states are represented in the legislative branch, which comprises two houses of Congress: the Senate and the House of Representatives. Senators are elected to a term of six years and Representatives are elected to a term of two years. Each Congress runs for two years. Our history of congressional representation begins in the year 1789 with the first Congress and continues to the present with the 117th Congress. As our Union has grown from the original thirteen colonies to the current fifty states, the District of Columbia, and several territories, and as colonization has proceeded generally from east to west, cumulative historical representation (which grows with population and with longevity within the Union) has become very large in the New England and mid-Atlantic regions and remains sparse in the plains and on the west coast.
The raw data curated by the Washington Post team require preprocessing and tidying (see the .md file in my GitHub repo for the relevant code).
The map below will orient the reader to a bird’s-eye view of the cumulative historic representation in Congress by members who either held slaves, did not hold slaves, or whose status on slave ownership is unknown. The appearance of the map is the result of my compromise of cartographic best practices with my intent to convey a message. The range of state-level Congressional representation is from 70 to 2602 member-Congresses. This range cannot be symbolized satisfactorily on a map without mathematical transformation. For this reason I employed a square-root transformation and divided the square root by 25. This preserves the visual impression that states to the East have high historical representation and states to the West do not. Next, I symbolized each state’s magnitude of historical Congressional representation as the radius of a state-level pie chart and the proportion of that representation who held slaves as the angle of the red-colored wedge within each pie. Finally, I labelled each state by its postal service abbreviation and repelled the label so that the extent of the red wedges would not be obscured. The reader will note that mid-Atlantic, mid-South and Deep South states have the largest red wedges. The hopeful result I intend is that all readers, regardless of familiarity with US history and emergent geography, will understand which states have or have not had high levels of historical Congressional representation with slaveholder sensibilities.
## [1] 70 2602
## [1] 0.335 2.040
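The two printed ranges above follow directly from the transformation described in the text. As a minimal sketch in base R (the function names `pie_radius` and `wedge_angle` are hypothetical helpers introduced here for illustration, not names taken from my project code), the scaling can be written as:

```r
# Reported minimum and maximum state-level representation (member-Congresses).
rep_range <- c(70, 2602)

# Pie radius: square root of the count, divided by 25, as described in the text.
pie_radius <- function(member_congresses) sqrt(member_congresses) / 25

# Hypothetical helper: angle (in degrees) of the red wedge for a given
# proportion of representation by slaveholders.
wedge_angle <- function(prop_slaveholder) 360 * prop_slaveholder

round(pie_radius(rep_range), 3)
```

Because the radius is proportional to the square root of the count, the area of each pie is proportional to the count itself, which is why the transformation preserves the east-to-west visual impression without letting the largest states swamp the map.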
Below I present an animated stacked bar graph of the state-level magnitude of Congressional representation, with color-coded symbolization of the proportion who had ever held slaves in red, those who had not in green, and those whose status on slave ownership is unknown in yellow. The animation runs for about three minutes. The history of our Union is depicted here in four eras. The first era (1789-1863) begins with the first Congress and ends with the Emancipation Proclamation issued by President Abraham Lincoln in 1863, during the 37th Congress. This proclamation declared that all persons held as slaves within the states then in rebellion were, and henceforward would be, free. The second era (1863-1923) spans the interval beginning with the Emancipation Proclamation and ending with the last Congress to include any member who had ever held slaves (the 67th). The third era (1923-1973) spans the years for which data are available and in which no member of Congress had ever held slaves. The fourth, modern era spans the years from the (putative) implementation of contemporary civil rights to the present.
Each Congress is displayed in two frames (one per second) because each Congress comprises two years; the entire sequence depicted comprises 184 years, hence the running time of about three minutes. The reader is invited to mindfully note (1) for how many years from the beginning (about 75) red bars are present, and how large they are, indicating slaveholder representation; (2) for how many years after Emancipation (37th Congress; about 60 years) representation still reflected slaveholder sensibilities; (3) how many years then elapsed in the data at hand during which no slaveholder sensibilities were officially represented (after the 67th Congress; about 50 years); and (4) the years since 1973, for which Congressional representation is not reflected in the data at hand.
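The era boundaries and the frame count can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the assumption that the animation covers Congresses 1 through 92 is inferred from the 184 years stated in the text rather than read from the animation code.

```r
# Starting year of the n-th Congress (the 1st Congress began in 1789,
# and each Congress runs for two years).
congress_start_year <- function(n) 1789 + 2 * (n - 1)

# Assign a Congress number to one of the four eras described in the text.
# Breaks fall after the 37th, 67th, and 92nd Congresses.
era_of_congress <- function(n) {
  cut(n,
      breaks = c(0, 37, 67, 92, Inf),
      labels = c("1789-1863", "1863-1923", "1923-1973", "1973-present"))
}

congresses <- 1:92                    # ASSUMPTION: span covered by the animation
frames <- rep(congresses, each = 2)   # two frames per Congress

length(frames)                        # total number of frames
length(frames) / 60                   # minutes at one frame per second
table(era_of_congress(congresses))    # Congresses per era
```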
The stroke mortality data at hand are curated as a three-year average, as of 2015, of state-level stroke mortality per 100,000 population among adults older than 35. These data were joined with the slaveholder data to enable plotting with state-level historical slaveholder representation on the x-axis and state-level contemporary stroke mortality on the y-axis. Since there are many US states where historical representation by slaveholding members of Congress is zero or very low, this variable was log-transformed to render it mathematically accessible for plotting and to spread out these very low values for exploration.
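A minimal sketch of the join and transformation follows. Every value in it is invented, the state codes are placeholders, and the columns `n_true`, `n_total`, and `propTrue` are hypothetical names; only `joinStroke`, `Data_Value`, and `logPropTrue` appear in the model output. The sketch also assumes `log1p()` (the logarithm of one plus the proportion), which keeps states with zero slaveholder representation at zero on the x-axis; the exact form of the logarithm used in my project code is not stated here, so treat that choice as an assumption.

```r
# Toy inputs: all numbers are invented for illustration only.
slave_rep <- data.frame(
  state   = c("AA", "BB", "CC", "DD"),
  n_true  = c(0, 5, 120, 300),      # member-Congresses held by slaveholders
  n_total = c(100, 250, 800, 1000)  # all member-Congresses
)
stroke <- data.frame(
  state      = c("AA", "BB", "CC", "DD"),
  Data_Value = c(62.1, 68.4, 80.3, 88.9)  # stroke deaths per 100,000
)

# Proportion of historical representation by slaveholders.
slave_rep$propTrue <- slave_rep$n_true / slave_rep$n_total

# ASSUMPTION: log1p() is used so that a proportion of zero stays finite (zero).
slave_rep$logPropTrue <- log1p(slave_rep$propTrue)

# Join the two sources on the state identifier.
joinStroke <- merge(slave_rep, stroke, by = "state")
joinStroke[, c("state", "propTrue", "logPropTrue", "Data_Value")]
```

A plain `log()` would return `-Inf` for the states with no slaveholder representation, which could not be plotted or modeled without first dropping or offsetting those rows.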
A static scatterplot places state-level contemporary stroke mortality on the y-axis and the logarithm of the proportion of each state’s historical Congressional representation who ever held slaves on the x-axis, overlaid by a regression line with a default 95% confidence interval. Inspection of this plot yields a strong visual impression of a positive slope, indicating that as historical slaveholder representation increases at the state level, contemporary stroke mortality also increases.
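For readers who want to build a plot of this kind themselves, here is a sketch using only base R graphics, so that no plotting package needs to be assumed; my own figure may have been drawn with different tools. The data are simulated stand-ins, not the real state values.

```r
set.seed(1)

# Simulated stand-in data (NOT the real values): 52 points with a positive trend.
toy <- data.frame(logPropTrue = runif(52, 0, 0.25))
toy$Data_Value <- 67 + 150 * toy$logPropTrue + rnorm(52, sd = 10)

fit <- lm(Data_Value ~ logPropTrue, data = toy)

# Evaluate the fitted line and its 95% confidence band on a fine grid.
grid <- data.frame(
  logPropTrue = seq(min(toy$logPropTrue), max(toy$logPropTrue), length.out = 100)
)
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

plot(Data_Value ~ logPropTrue, data = toy, pch = 19,
     xlab = "log proportion of representation by slaveholders",
     ylab = "Stroke mortality per 100,000")

# Shade the confidence band, then draw the regression line on top.
polygon(c(grid$logPropTrue, rev(grid$logPropTrue)),
        c(band[, "lwr"], rev(band[, "upr"])),
        col = adjustcolor("grey50", alpha.f = 0.3), border = NA)
abline(fit, lwd = 2)
```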
Inspection of an interactive scatterplot programmed as above but with the added capability that the viewer may hover over each point and identify which state is represented, along with the values on the x and y axes, supports the above interpretation.
The reader will note that the slope and intercept of this model are 150.49 and 66.93, respectively, and that this model explains about 24% of the variability in contemporary stroke mortality among the data at hand (adjusted R-squared 0.2392; multiple R-squared 0.2542). The explanatory variable is statistically significant (p = 0.00014). Inspection of the diagnostic plots indicates that the linear model may not be the best fit for these data, but let’s put this on the back burner for now.
##
## Call:
## lm(formula = Data_Value ~ logPropTrue, data = joinStroke)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -22.855  -5.490   1.360   8.257  19.029
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    66.93       1.81  36.981  < 2e-16 ***
## logPropTrue   150.49      36.46   4.128 0.000139 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.995 on 50 degrees of freedom
## Multiple R-squared: 0.2542, Adjusted R-squared: 0.2392
## F-statistic: 17.04 on 1 and 50 DF, p-value: 0.0001387
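The quantities in this summary are tied together by simple identities for a one-predictor model, so a reader can sanity-check them by hand. The sketch below starts from the rounded slope, standard error, and residual degrees of freedom printed above; because the inputs are rounded, the results should agree with the printed values only approximately.

```r
# Rounded values taken from the printed coefficient table.
est      <- 150.49   # slope estimate for logPropTrue
se       <- 36.46    # its standard error
df_resid <- 50       # residual degrees of freedom

t_val <- est / se                              # t statistic for the slope
p_val <- 2 * pt(-abs(t_val), df = df_resid)    # two-sided p-value
f_val <- t_val^2                               # with one predictor, F = t^2
r2    <- f_val / (f_val + df_resid)            # multiple R-squared
n     <- df_resid + 2                          # observations (slope + intercept)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid    # adjusted R-squared

round(c(t = t_val, F = f_val, R2 = r2, adjR2 = adj_r2), 4)
signif(p_val, 3)
```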
Please consult my web app, available at https://jrhudy57.shinyapps.io/myfinalapp/. The app will render a scatterplot of state-level contemporary stroke mortality by log-transformed proportion of Congressional representation who have ever held slaves. By using the mouse to drag a rectangle around all the plotted points, the reader will note that the slope and intercept reproduce exactly the slope and intercept of the model fit1 reported above. If the reader wishes to exclude the states which were never represented by slaveholding members of Congress, he or she may sweep the left border of the rectangle slightly to the right to avoid the stack of points representing zero on the x-axis. The reader will then note that, once those states are excluded, the association between historical slaveholder representation and contemporary stroke mortality among the remaining states grows stronger.
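What the rectangle does can be illustrated outside the app. The following is a plain base R sketch, not the app's source code: `refit_in_rectangle` is a hypothetical helper, and the data are simulated. In a Shiny app the selected rows are commonly obtained with `brushedPoints()`; whether my app does exactly that is not shown here.

```r
# Hypothetical helper: refit the model using only the points that fall
# inside a rectangle defined by x and y limits.
refit_in_rectangle <- function(data, xmin, xmax, ymin, ymax) {
  keep <- data$logPropTrue >= xmin & data$logPropTrue <= xmax &
          data$Data_Value  >= ymin & data$Data_Value  <= ymax
  coef(lm(Data_Value ~ logPropTrue, data = data[keep, ]))
}

set.seed(2)
# Simulated stand-in data (NOT the real values): ten states stacked at zero,
# the rest spread over positive x values.
toy <- data.frame(logPropTrue = c(rep(0, 10), runif(42, 0.01, 0.25)))
toy$Data_Value <- 67 + 150 * toy$logPropTrue + rnorm(52, sd = 10)

# A rectangle around every point reproduces the full-data fit.
all_pts <- refit_in_rectangle(toy, -Inf, Inf, -Inf, Inf)

# Moving the left border slightly to the right drops the stack at zero.
no_zero <- refit_in_rectangle(toy, 0.005, Inf, -Inf, Inf)

rbind(all_points = all_pts, zeros_excluded = no_zero)
```

Comparing the two rows shows how much the intercept and slope depend on the states at zero; with the real data the direction and size of that change are what the app lets the reader explore.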
I do not intend to teach anyone that stroke mortality is high in the American southeast; that fact is already well documented, and others are continuing to investigate it. What I really want people to understand is that historical exposure to chattel slavery, which has been illegal for 159 years, is hazardous to one’s contemporary health and that this historical exposure has somehow arisen from a mechanism which has become at least in part geographically operative. Stroke is a very complex disease, co-determined by many factors; however, my finding that this simple model explains 24% of the variability in stroke mortality in the data at hand is important. Any model of complex disease which includes only one candidate predictor is woefully mis-specified. This proportion of variability explained is unusual under these circumstances.
I have conducted an ordinary least squares regression analysis on a merged data set from two convenient sources. These skills were acquired over a decade of study and practice. I have also made it possible for the reader to fit this linear model and replicate my analysis by simply drawing a rectangle over a plot in a web application embedded in this document. This is a handy superpower which one can use to communicate with one’s boss, PhD committee, or promotion and tenure committee. I have done this using only open-source software, data in the public domain, and online resources available at a very modest cost. Most people interested in this kind of analysis will have access to SAS or SPSS but you may wish to work with those who do not. My work will continue to enrich this web app with further capabilities (subsetting by race and gender, comparison with other causes of mortality, and so on). I welcome your feedback as the work proceeds.
I happen to be a grandfather and a person of faith whose duty includes love of neighbor. Primarily for these two reasons, but including many others, I believe very deeply that racial reconciliation is in everyone’s best interest. Many would approach this by labeling our grandchildren as ‘oppressor’ or ‘oppressed’. I believe this to be a mistake; children will either learn to associate success with guilt or they will learn a convenient excuse not to function. In either case they will not learn to learn; they will learn to despise themselves. It is a sad fact that many racist notions are deeply embedded in the institutions that shape our common life. Anyone who doubts this or fails to teach it is simply making a different kind of mistake. There is no one still alive who is responsible for this sad heritage. Children must be taught this history or it may repeat itself. Let’s work together to strike the proper balance.